{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quantile analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!{sys.executable} -m pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cvxpy as cvx\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import time\n",
    "import os\n",
    "import quiz_helper\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = (14, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### data bundle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import quiz_helper\n",
    "from zipline.data import bundles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')\n",
    "ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)\n",
    "bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)\n",
    "print('Data Registered')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build pipeline engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline import Pipeline\n",
    "from zipline.pipeline.factors import AverageDollarVolume\n",
    "from zipline.utils.calendars import get_calendar\n",
    "\n",
    "universe = AverageDollarVolume(window_length=120).top(500) \n",
    "trading_calendar = get_calendar('NYSE') \n",
    "bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)\n",
    "engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data¶\n",
    "With the pipeline engine built, let's get the stocks at the end of the period in the universe we're using. We'll use these tickers to generate the returns data for the our risk model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')\n",
    "\n",
    "universe_tickers = engine\\\n",
    "    .run_pipeline(\n",
    "        Pipeline(screen=universe),\n",
    "        universe_end_date,\n",
    "        universe_end_date)\\\n",
    "    .index.get_level_values(1)\\\n",
    "    .values.tolist()\n",
    "    \n",
    "universe_tickers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get Returns data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.data.data_portal import DataPortal\n",
    "\n",
    "data_portal = DataPortal(\n",
    "    bundle_data.asset_finder,\n",
    "    trading_calendar=trading_calendar,\n",
    "    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,\n",
    "    equity_minute_reader=None,\n",
    "    equity_daily_reader=bundle_data.equity_daily_bar_reader,\n",
    "    adjustment_reader=bundle_data.adjustment_reader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get pricing data helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_pricing(data_portal, trading_calendar, assets, start_date, end_date, field='close'):\n",
    "    end_dt = pd.Timestamp(end_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')\n",
    "    start_dt = pd.Timestamp(start_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')\n",
    "\n",
    "    end_loc = trading_calendar.closes.index.get_loc(end_dt)\n",
    "    start_loc = trading_calendar.closes.index.get_loc(start_dt)\n",
    "\n",
    "    return data_portal.get_history_window(\n",
    "        assets=assets,\n",
    "        end_dt=end_dt,\n",
    "        bar_count=end_loc - start_loc,\n",
    "        frequency='1d',\n",
    "        field=field,\n",
    "        data_frequency='daily')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## get pricing data into a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "returns_df = \\\n",
    "    get_pricing(\n",
    "        data_portal,\n",
    "        trading_calendar,\n",
    "        universe_tickers,\n",
    "        universe_end_date - pd.DateOffset(years=5),\n",
    "        universe_end_date)\\\n",
    "    .pct_change()[1:].fillna(0) #convert prices into returns\n",
    "\n",
    "returns_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sector data helper function\n",
    "We'll create an object for you, which defines a sector for each stock.  The sectors are represented by integers.  We inherit from the Classifier class.  [Documentation for Classifier](https://www.quantopian.com/posts/pipeline-classifiers-are-here), and the [source code for Classifier](https://github.com/quantopian/zipline/blob/master/zipline/pipeline/classifiers/classifier.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.classifiers import Classifier\n",
    "from zipline.utils.numpy_utils import int64_dtype\n",
    "class Sector(Classifier):\n",
    "    dtype = int64_dtype\n",
    "    window_length = 0\n",
    "    inputs = ()\n",
    "    missing_value = -1\n",
    "\n",
    "    def __init__(self):\n",
    "        self.data = np.load('../../data/project_4_sector/data.npy')\n",
    "\n",
    "    def _compute(self, arrays, dates, assets, mask):\n",
    "        return np.where(\n",
    "            mask,\n",
    "            self.data[assets],\n",
    "            self.missing_value,\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sector = Sector()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## We'll use 2 years of data to calculate the factor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** Going back 2 years falls on a day when the market is closed. Pipeline package doesn't handle start or end dates that don't fall on days when the market is open. To fix this, we went back 2 extra days to fall on the next day when the market is open."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)\n",
    "factor_start_date"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create smoothed momentum factor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.factors import Returns\n",
    "from zipline.pipeline.factors import SimpleMovingAverage\n",
    "\n",
    "\n",
    "# create a pipeline called p\n",
    "p = Pipeline(screen=universe)\n",
    "# create a factor of one year returns, deman by sector, then rank\n",
    "factor = (\n",
    "    Returns(window_length=252, mask=universe).\n",
    "    demean(groupby=Sector()). #we use the custom Sector class that we reviewed earlier\n",
    "    rank().\n",
    "    zscore()\n",
    ")\n",
    "\n",
    "\n",
    "# Use this factor as input into SimpleMovingAverage, with a window length of 5\n",
    "# Also rank and zscore (don't need to de-mean by sector, s)\n",
    "factor_smoothed = (\n",
    "    SimpleMovingAverage(inputs=[factor], window_length=5).\n",
    "    rank().\n",
    "    zscore()\n",
    ")\n",
    "\n",
    "# add the unsmoothed factor to the pipeline\n",
    "p.add(factor, 'Momentum_Factor')\n",
    "# add the smoothed factor to the pipeline too\n",
    "p.add(factor_smoothed, 'Smoothed_Momentum_Factor')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## visualize the pipeline\n",
    "\n",
    "Note that if the image is difficult to read in the notebook, right-click and view the image in a separate tab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p.show_graph(format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## run pipeline and view the factor data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = engine.run_pipeline(p, factor_start_date, universe_end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluate Factors\n",
    "\n",
    "We'll go over some tools that we can use to evaluate alpha factors.  To do so, we'll use the [alphalens library](https://github.com/quantopian/alphalens)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import alphalens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import alphalens as al"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get price data\n",
    "\n",
    "Note, we already got the price data and converted it to returns, which we used to calculate a factor.  We'll retrieve the price data again, but won't convert these to returns.  This is because we'll use alphalens functions that take their input as prices and not returns.\n",
    "\n",
    "## Define the list of assets\n",
    "Just to make sure we get the prices for the stocks that have factor values, we'll get the list of assets, which may be a subset of the original universe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get list of stocks in our portfolio (tickers that identify each stock)\n",
    "assets = df.index.levels[1].values.tolist()\n",
    "print(f\"stock universe number of stocks {len(universe_tickers)}, and number of stocks for which we have factor values {len(assets)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_start_date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pricing = get_pricing(\n",
    "        data_portal,\n",
    "        trading_calendar,\n",
    "        assets, #notice that we used assets instead of universe_tickers; in this example, they're the same\n",
    "        factor_start_date, # notice we're using the same start and end dates for when we calculated the factor\n",
    "        universe_end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_names = df.columns\n",
    "print(f\"The factor names are {factor_names}\")\n",
    "\n",
    "# Use a dictionary to store each dataframe, one for each factor and its associated forward returns\n",
    "factor_data = {}\n",
    "for factor_name in factor_names:\n",
    "    print(\"Formatting factor data for: \" + factor_name)\n",
    "    # Get clean factor and forward returns for each factor\n",
    "    # Choose single period returns (daily returns)\n",
    "    factor_data[factor_name] = al.utils.get_clean_factor_and_forward_returns(\n",
    "        factor=df[factor_name],\n",
    "        prices=pricing,\n",
    "        periods=[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## quantile analysis\n",
    "\n",
    "Alphalens [mean_return_by_quantile documentation.](https://quantopian.github.io/alphalens/alphalens.html#alphalens.performance.mean_return_by_quantile)\n",
    "\n",
    "```\n",
    "alphalens.performance.mean_return_by_quantile(factor_data,\n",
    "                                              ...\n",
    "                                              demeaned=True,\n",
    "                                              ...)\n",
    "```\n",
    "* factor_data: A MultiIndex DataFrame indexed by date (level 0) and asset (level 1), containing the values for a single alpha factor, forward returns for each period, the factor quantile/bin that factor value belongs to, and (optionally) the group the asset belongs to\n",
    "* demeaned: this is True by default.  This makes a call to [alphalens.utils.demean_forward_returns](https://quantopian.github.io/alphalens/alphalens.html#alphalens.utils.demean_forward_returns), which takes each factor return and subtracts the mean of all the factor returns in our universe of stocks.\n",
    "* returns:mean_ret : pd.DataFrame: Mean period wise returns by specified factor quantile.\n",
    "* Note that it returns a second variable, standard error of the returns.  We will focus on the mean return."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 1\n",
    "Look at the error message when trying to call the mean_return_by_quantile function, passing in the factor data.  What data type is required?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_names = df.columns\n",
    "ls_fra = []\n",
    "\n",
    "for i, factor_name in enumerate(factor_names):\n",
    "    print(\"Calculating the FRA for: \" + factor_name)\n",
    "    \n",
    "    #TODO: look at the error generated from this line of code\n",
    "    quantile_return, quantile_stderr = al.performance.mean_return_by_quantile(factor_data[factor_name])\n",
    "    quantile_return.columns = [factor_name]\n",
    "    qr_factor_returns.append(quantile_return)\n",
    "    \n",
    "df_ls_fra = pd.concat(ls_fra, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 1 here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert datetime to integer\n",
    "\n",
    "To pass in factor data that the factor_rank_autocorrelation function can use, we'll convert the datetime into an integer using unix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "unixt_factor_data = {}\n",
    "for factor_name in factor_names:\n",
    "    unixt_index_data = [(x.timestamp(), y) for x, y in factor_data[factor_name].index.values]\n",
    "    unixt_factor_data[factor_name] = factor_data[factor_name].set_index(pd.MultiIndex.from_tuples(unixt_index_data, names=['date', 'asset']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 2: \n",
    "Calculate the quantile returns.\n",
    "\n",
    "Use the data for which the datetime index was converted to integer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_names = df.columns\n",
    "ls_qr = []\n",
    "\n",
    "for i, factor_name in enumerate(factor_names):\n",
    "    print(\"Calculating the FRA for: \" + factor_name)\n",
    "    \n",
    "    #TODO: calculate quantile returns and standard errors\n",
    "    # store the quantile returns\n",
    "    quantile_return, quantile_stderr = # ...\n",
    "    quantile_return.columns = [factor_name]\n",
    "    ls_qr.append(quantile_return)\n",
    "df_ls_qr = pd.concat(ls_qr, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View the outputted FRA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ls_qr.plot.bar(\n",
    "    subplots=True,\n",
    "    sharey=True,\n",
    "    layout=(4,2),\n",
    "    figsize=(14, 14),\n",
    "    legend=False,\n",
    "    title='Alphas Comparison: Per Day per Quantile'\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 3\n",
    "How would you compare the quantile returns of the unsmoothed and smoothed factors? Which one would you prefer based on just the quantile returns?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 3 here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 4 basis points\n",
    "\n",
    "Notice how the y-axis has pretty small numbers for the percentages.  We normally use basis points as the unit of measurement.  To convert from decimal (e.g. 0.01 is one percent) to basis points, multiply by $10^4$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: convert values to basis points\n",
    "df_ls_qr_bp = # ...\n",
    "df_ls_qr_bp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## re-plot using basis point scaling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_ls_qr_bp.plot.bar(\n",
    "    subplots=True,\n",
    "    sharey=True,\n",
    "    layout=(4,2),\n",
    "    figsize=(14, 14),\n",
    "    legend=False,\n",
    "    title='Alphas Comparison: Per Day per Quantile (basis points)'\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solution notebook\n",
    "[The solution notebook is here.](quantiles_solution.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
